AN  ITEM  ANALYSIS  OF  AN  OBJECTIVE  TEST  IN  BIOLOGY


by 


CKI.HKRT  AIXKH  HSWBBRIIY 


B. S., Fort Hays Kansas State College, 1939


A  THESIS 


submitted in partial fulfillment of the


requirements for the degree of


MASTER OF SCIENCE


Department of Education and Psychology


KANSAS STATE COLLEGE
OF AGRICULTURE AND APPLIED SCIENCE


1947 


TABLE OF CONTENTS


INTRODUCTION

REVIEW OF LITERATURE

DESCRIPTION OF PROCEDURE

    Material

    Reliability

    Validity

    Comparison of Item Validities

    Order of Difficulty

    Choice of Responses

    Revised Test of Sixty Items

SUMMARY AND CONCLUSIONS

ACKNOWLEDGMENT

LITERATURE CITED

INTRODUCTION

Testing at the beginning of a course is important in helping
the teacher to adjust his teaching to the needs and educational
level of his students. Achievement may be measured either with
reference to an arbitrary standard of what a student should be
like at the end of a course or with reference to what he was like
at the beginning of the course and his progress since that time.
The use of a standardized test at the end of a course is helpful
in estimating the extent to which the objectives of the course
have been achieved. In cases where a student requests advanced
credit in a course, a standardized test is the best instrument
for comparing his achievement with that of students who have
completed the course.

The USAFI Tests of General Educational Development are an
outstanding example of standardized tests for use at both the
high school and college levels. Their major purposes (26) are
to provide a basis for vocational and educational guidance for
veterans, to assist schools in the placement of returning
veterans and to help schools determine the amount of academic
credit to be granted for educational experiences in military
service. Tyler (26) lists three types of opportunities for
educational experiences in military service. They are military
training, the off-duty educational program and informal
experiences. After World War I, many educational institutions
granted blanket credit for military service with unsatisfactory
results in many cases. To avoid similar results after World War
II, a special committee called together by the American Council
on Education decided that a uniform system of testing to
demonstrate competence should be developed to aid the schools in
handling requests for advanced credit. Bradley (3) found a
correlation of .66 between grade-point averages in social studies
and scores on the GED test in Interpretation of Reading Materials
in Social Studies, but found no significant relationship between
test scores and number of hours of college credit completed in
the field. Because they are tests of general educational
development rather than achievement tests for specific courses
they do not fill the need for standardized tests adapted for use
with particular college courses.

The development of college level achievement tests has been
encouraged by the Botanical Society of America through the work
of its committee on the teaching of botany in American colleges
and universities (5, 14). They emphasized the point that a
valid achievement test should measure more than the students'
memory for facts. The ability to apply the facts learned is also
essential. The objectives of the course should be clarified and
the test constructed to measure the extent to which the student
responds in the desired way in view of all the objectives of the
course.

At Kansas State College, Dr. B. J. Herbaugh, Professor of
Zoology, constructed a 100 item objective test in biology which
he desires to standardize. This test was given to students
enrolled in the course, Biology in Relation to Man, in September,
1946. At the conclusion of the two semester course, the test was
administered again to the same students.

Before the validation and standardization of the test could
be completed, an item analysis was necessary to determine wherein
it could be revised and improved. Such an item analysis is
the problem of this thesis.

In the development of this problem the following aspects
were studied:

1. An item analysis was made to determine the validity and
difficulty of each item. The relationship between item
difficulty and item validity was investigated.

2. The validity of the test was determined by correlation
of the total test scores with grades for the two semesters in
the course, Biology in Relation to Man.

3. The reliability of the test was determined by the
Kuder-Richardson formula.

4. The 60 items with validity coefficients of .20 or higher
were selected and reliability and validity coefficients were
computed for this 60 item test.

5. An analysis was made of the choices of answers to all
questions as a basis for the revision of the items to secure
greater validity.

6. Recommendations were made for the revision of the test.


REVIEW OF LITERATURE

Item analysis involves the two general problems of item
validity and item difficulty.

A study of item validity deals with the diagnostic value
of each item for predicting a criterion. Guilford (10) has
expressed the purpose thus: "to be diagnostic of any trait, an
item must enable us to distinguish between individuals who have
more or less of that trait." If the criterion scores of
individuals who pass an item are not significantly different from
the criterion scores of those who fail to pass that item, the
item does not contribute to the measurement of the trait of which
the criterion is the standard.

In determining item validity, just as in determining the
validity of a total test, the choice of a criterion is of
primary importance. The two types of criteria with which test
items may be correlated are an independent criterion such as is
used in validating a total test and the criterion of internal
consistency. Swineford (23) has shown that statistical methods
of item-criterion correlation are equally applicable to
independent and internal criteria so the criterion chosen should
be the one which most nearly represents the trait to be measured.

Internal consistency, or the correlation of the item with
the total test score, is the most widely used criterion for the
validation of test items. Owens (20) has criticized the method
on the basis that it may result in the narrowing of the test so
that the validity of the total test may be decreased. Swineford
(23) pointed out that in some cases the total test score is the
best measure available of the trait to be tested and therefore
an internal criterion may be superior to an independent criterion.
She also stated that in any case where a test was known to be
valid, item validity could be satisfactorily determined by the
correlation of the item with the total test score. Guilford (10)
recognized both methods as acceptable although he cautioned
against too great narrowing of the test by the method of
internal consistency and warned that an independent criterion
must be chosen with care because of the difficulty of finding an
adequate criterion.

Internal consistency was chosen as the criterion for use in
the item analysis of the biology test.

There are numerous statistical methods for determining the
correlation of an item with a criterion. Certain standard
technics such as the biserial r, the tetrachoric r and the phi
coefficient are recognized and described in the statistical
literature. Other methods, described in the professional
journals, have been developed as practical short cuts to obtain
approximately the same result with a simplification of the
method of computation.

The biserial coefficient of correlation is generally
recognized as one of the most accurate methods of determining item
validity because each criterion score is given due weight without
change of value by grouping into categories. The formula for
the biserial r as given by Guilford (10) is:

$$ r_{bi} = \frac{M_p - M_q}{\sigma_t} \times \frac{pq}{y} $$

where $M_p$ = the mean criterion score of the group passing the item
      $M_q$ = the mean criterion score of the group failing the item
      $p$ = the proportion of cases in the higher group
      $q$ = the proportion of cases in the lower group
      $y$ = the ordinate of the normal distribution curve with
            surface equal to 1.00, at the point of division between
            the segments containing p and q proportions of cases
      $\sigma_t$ = the standard deviation of the total sample in the
            continuously measured variable (criterion).

This formula is based on the principle that if there is no
difference between the mean criterion score of the group giving
the correct response to an item and the mean criterion score of
the group giving a wrong response to the item, there is no
correlation between the item and the criterion. The larger the
difference between the means the higher is the correlation. The
principal objection to this method of item analysis is the time
consumed in sorting and computing mean criterion scores separately
for those passing and for those failing each item since the
number and particular individuals who pass differ from item to
item.
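
The computation described above may be illustrated by a minimal sketch in Python. It assumes the standard form of the biserial r, (Mp - Mq)/sigma_t times pq/y; the function name and the sample data are hypothetical and are not taken from the biology test.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(passed, criterion):
    """Biserial r between a pass/fail item and a continuous criterion.

    passed    -- list of 1/0 flags, one per student (hypothetical data)
    criterion -- list of criterion scores in the same order
    Requires at least one student passing and one failing the item.
    """
    pass_scores = [c for ok, c in zip(passed, criterion) if ok]
    fail_scores = [c for ok, c in zip(passed, criterion) if not ok]
    p = len(pass_scores) / len(criterion)   # proportion passing
    q = 1.0 - p                             # proportion failing
    std_normal = NormalDist()
    # Ordinate of the unit normal curve at the point dividing p from q.
    y = std_normal.pdf(std_normal.inv_cdf(p))
    sigma_t = pstdev(criterion)             # SD of the total sample
    return (mean(pass_scores) - mean(fail_scores)) / sigma_t * (p * q / y)


if __name__ == "__main__":
    # Equal means in the passing and failing groups give no correlation.
    print(biserial_r([1, 0, 1, 0], [1, 1, 3, 3]))
```

The sorting of scores into passing and failing groups, repeated for every item, is the step that made the method laborious by hand.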

The tetrachoric correlation coefficient is frequently used
in item analysis. It assumes that both variables are continuous
and normally distributed but are reduced artificially to two
categories each. Guilford (10) listed as its disadvantages the
fact that it is extremely difficult to compute by formula and
that it is less reliable than the Pearson r because of the coarse
grouping into only two categories. For this reason it is
useful only with large samples. Computing diagrams may be used to
effect considerable saving in time when a large number of
tetrachoric r's are to be computed. Guilford (10) recommended the
Thurstone computing diagrams by Chesire, Saffir and Thurstone
(6). Other computing diagrams have been published more recently
by Hayes (12).

The phi coefficient is another statistical technic which is
used in item analysis. When one or both of the traits are
really dichotomous it is the most suitable method of correlation
according to Guilford (10). He found the chief objection to it
based on the fact that it is not always equivalent to the
Pearson r and can not be interpreted in the same way. When both
variables are continuous but one is dichotomously scored, the
phi coefficient is smaller than the Pearson r. The phi
coefficient also varies in size according to the percentage of the
cases included in the upper and lower criterion groups. These
disadvantages are unimportant if only the relative validities of
the items of a test are needed to evaluate the items.
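
For illustration, the phi coefficient for a single item can be computed from a fourfold table of criterion group by item result. The sketch below uses the usual formula (ad - bc) divided by the square root of the product of the four marginal totals; the cell labels and the counts are hypothetical.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a fourfold table (hypothetical cell labels):

                    pass   fail
        upper group   a      b
        lower group   c      d

    All four marginal totals must be greater than zero.
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


if __name__ == "__main__":
    # 50 students in each criterion group; 30 of the upper group and
    # 10 of the lower group pass the item.
    print(round(phi_coefficient(30, 20, 10, 40), 3))
```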

Jurgensen (15) has developed tables for determining phi
coefficients accurate to three places and identical to those
obtained by formula if sub-groups are equal in number. He
pointed out that by the use of these tables item validities could
be determined more accurately and quickly than by many methods
designed to reduce computation time which also sacrificed some
efficiency and accuracy.


Turnbull (25) presented a normalized graphic method of item
analysis which included not only the correlation of the item with
the criterion but also an analysis of choice of responses. By
this method the students were divided into sixths according to
the criterion ratings and the percentage of students in each
sixth choosing each response was plotted on the graph. The line
for the correct response was expected to show a considerable
upward slope from the lowest sixth to the highest sixth as
increasing percentages of the better students selected it while
the lines for the incorrect responses were expected to slope
downward from the poorest to the best students. An item was
considered in need of revision if the lines for the responses did
not slope as expected. He also stated that any response which did
not attract a considerable percentage of the students should be
made more plausible.
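
The tabulation underlying such a graph can be sketched as follows. This is only an illustration of the division into sixths and the percentage of each sixth choosing each response; it does not reproduce the normalizing or plotting steps of the published method, and the function name and arguments are hypothetical.

```python
from collections import Counter


def response_percentages_by_sixth(criterion, responses):
    """Percentage of each criterion sixth choosing each response.

    criterion -- criterion score for each student
    responses -- the alternative each student chose on one item
    Returns six dicts, lowest sixth first. Assumes at least six students.
    """
    # Student indices ordered from lowest to highest criterion score.
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    n = len(order)
    table = []
    for k in range(6):
        group = order[k * n // 6:(k + 1) * n // 6]
        counts = Counter(responses[i] for i in group)
        table.append({resp: 100.0 * cnt / len(group)
                      for resp, cnt in counts.items()})
    return table


if __name__ == "__main__":
    scores = list(range(12))
    chosen = ["B", "B", "B", "A", "B", "A", "A", "B", "A", "A", "A", "A"]
    for sixth in response_percentages_by_sixth(scores, chosen):
        print(sixth)
```

If "A" is the keyed answer, its percentage should rise from the first row to the last, and the percentages for the wrong alternatives should fall.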

By using arbitrary distributions of the criterion scores
Adkins and Toops (1) derived several modifications of the
Pearson correlation coefficient of a dichotomous variable with a
multiple-categoried variable. Their formulas simplified
computation and effected a considerable saving in time without
sacrificing accuracy or requiring correction for coarse grouping.

They presented several formulas for different numbers of
categories and distributions of the criterion but recommended two
as the most convenient to use. One was an approximately normal
distribution with the total number of cases a multiple of 16,
divided into five categories with relative frequencies of 1, 4,
6, 4 and 1. The other was a rectangular distribution with the
total number of cases a multiple of five, divided into five
equal categories. In either distribution the criterion scores
were coded symmetrically about zero as -2, -1, 0, 1, and 2, with
the coded scores assigned to the five categories in order of
excellence.

The formula for use with a rectangular distribution of five
equal categories assigned coded scores as above is:

$$ r_{xy} = \frac{2a + b - d - 2e}{\sqrt{2RW}} $$

where $r_{xy}$ = the coefficient of correlation between the item and
             the criterion
      $a$ = the number of right answers in the highest fifth
      $b$ = the number of right answers in the second fifth
      $c$ = the number of right answers in the third fifth
      $d$ = the number of right answers in the fourth fifth
      $e$ = the number of right answers in the lowest fifth
      $R$ = the total number of right answers
      $W$ = the total number of wrong answers

This method was selected for use in the item analysis of the
test in biology on the basis of its freedom from certain
objectionable features of the other methods.
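
As a check on the arithmetic, the Pearson r of a right/wrong item with five equal criterion categories coded 2, 1, 0, -1 and -2 reduces algebraically to (2a + b - d - 2e) divided by the square root of 2RW, since the coded criterion has a mean of zero and a variance of 2. The sketch below computes this shortcut and compares it with a direct Pearson computation; the function names and the counts are hypothetical.

```python
from math import sqrt


def rectangular_item_r(a, b, c, d, e, fifth_size):
    """Shortcut item-criterion r for five equal criterion categories.

    a..e are the numbers of right answers in the highest through the
    lowest fifth; fifth_size is the number of students in each fifth.
    The item must have at least one right and one wrong answer.
    """
    right = a + b + c + d + e
    wrong = 5 * fifth_size - right
    return (2 * a + b - d - 2 * e) / sqrt(2 * right * wrong)


def pearson_r(x, y):
    """Ordinary product-moment correlation, used here only as a check."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    rights = [4, 3, 2, 1, 0]          # right answers per fifth, best first
    size = 4                          # students per fifth
    item, coded = [], []
    for code, r in zip([2, 1, 0, -1, -2], rights):
        item += [1] * r + [0] * (size - r)
        coded += [code] * size
    print(rectangular_item_r(*rights, fifth_size=size))
    print(pearson_r(item, coded))     # agrees with the shortcut
```

Only the five counts of right answers are needed for each item, which is the source of the saving in time.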

Adkins and Toops (1) also derived a similar formula for
evaluating item alternatives by determining the correlation of
each wrong response with the criterion. They showed that all
wrong responses should have low negative validity coefficients
and that if any wrong response showed an appreciable positive
validity coefficient it should be revised or eliminated. Any
response which is closely enough related to the right answer to
attract a significantly higher proportion of the better students
than of the poorer students materially reduces the diagnostic
value of the item and the validity of the correct response.

Determining the difficulty of test items as a part of an
item analysis is necessary for the best arrangement of items
within the test and as a basis for adjusting the difficulty
level to the group to be tested.

Arrangement of test items in order of difficulty, easiest
items first, is accepted as a standard procedure in test
construction. This allows the students to start with a feeling of
confidence and provides that if any questions are omitted for
lack of time they will be the most difficult ones which the
students would have been least likely to answer correctly.
Henry (18) suggested that although a few very easy items at the
beginning of a test had little validity value they had another
value in encouraging the students. Guilford (10) also approved
the use of one or two very easy "shock absorbers" at the
beginning of a test even though they contributed nothing to
measurement because all the students could pass them.

The difficulty of test items may be graduated more or less
steeply from beginning to end or they may be made up mainly of
items of average difficulty. Guilford (10) stated that the
maximum discrimination among testees is obtained by items that
about half the individuals can pass. Lentz, Hirshstein and Finch
(17) agreed that this rule applied to many types of tests and
was a good preliminary method of evaluating test items but stated
that the rule did not hold for many tests of skill and knowledge.
Symonds (24) quoted Thorndike's statement that "Items at any
level of difficulty are a valid measure of an individual's
ability at all levels in tasks homogeneous in construction and
type of material." After further study of the problem Symonds
(24) concluded that the best test for measuring a homogeneous
group is one in which 50 per cent of all items can be passed by
individuals with median scores but that such a test would not
measure adequately the upper and lower extremes of a
heterogeneous group. He stated that the best test for a heterogeneous
group is one constructed with items ranging evenly in
difficulty from those passed by 50 per cent of the students in the
lowest section of the group to those passed by 50 per cent of the
individuals in the highest section of the group.

Henry (18) divided a test into easy, medium and hard items
according to the number passing each item and compared the item
validities for the three groups of questions. He found no
reliable relationship between the difficulty of a test item and
its validity.

In studying the relative merits of different methods of
evaluating test items Swineford (23) found a correlation of
-•00&& between the balance of right and wrong answers and item
validities determined by the biserial r when applied to the same
data.

Brogden (4) warned against the danger of decreasing the
validity of a test by too great narrowing of the range of
difficulty even though this narrowing of range of difficulty might
increase the reliability of the test.

The different statements regarding level of difficulty may
be summarized in the statement that items which are so easy that
they are failed only by chance and items which are so difficult
that they are passed only by chance contribute nothing to
measurement; items of medium difficulty are preferred by most
writers, but there must be sufficient range of difficulty to test
both extremes of the group.


DESCRIPTION OF PROCEDURE


Material


The 100 item multiple-choice test in biology was given to
Kansas State College students who enrolled in the course,
Biology in Relation to Man, in the fall of 1946 and was given
again in the spring of 1947 at the conclusion of the two semester
course.

The test was dichotomously scored, in that there was one
response recognized as right by authoritative judgment of the
faculty of the department. Any other response or omission of the
item was scored as "not right". The test was presented in a
mimeographed booklet. Separate answer sheets were used to
facilitate scoring and to permit the re-use of the test booklets.
The number of alternative responses to each item varied: 63 items
had five alternative answers; 35 questions had four responses
from which to choose; and two items had only three optional
responses.

There were 608 students enrolled in the course for at least
one semester; 99 were eliminated from the study because they
either were not enrolled the second semester or, as seniors, did
not take the objective test at the end of the course. The
number was further reduced by elimination of 80 students who either
were not enrolled the first semester or who missed the preliminary
test. Because the Toops-Adkins method required that the number
of students be a multiple of five and it was desirable to avoid
further arbitrary reductions of the number, one student who had
grades for both semesters but had missed the first test was
included in the study. This made the number of students, upon
whose records computations were based, exactly 400, except in the
correlation of the preliminary test scores with grades for which
data regarding only 393 students were available.

Reliability 

The reliability of a test is defined by Lindquist (13) as
its self-consistency. A perfectly reliable test would be one free
from errors of measurement so that successive measurements of the
same individuals or phenomena would yield exactly the same values.
Although no perfectly reliable test of psychological functions
exists, it is essential to know how reliable a test is,
because, as Bingham (2) has pointed out, no test can have a validity
coefficient greater than its reliability coefficient. He has
also shown that no test can have a greater validity coefficient
than the reliability coefficient of the criterion with which it
is correlated although this is often difficult or virtually
impossible to determine.

There are three traditional methods of determining the
reliability of any test according to Lindquist (19) and Guilford (10).
These are the test-retest method, the alternate-forms method and
the split-half method. More recently the Kuder-Richardson
formula has been added as a method of determining test
reliability and special studies of the split-half method have been
made by Guttman (11), Cronbach (7, 8) and others in an attempt to
call attention to its weaknesses and to refine the method.

Guilford (10) states that the principal objection to the
test-retest method is found in the individual differences in
learning during the time between the two tests and in
individual differences in learning from the practice effect of the first
test. The use of this method to determine the reliability of the
biology test would have been of no value because in the time
interval between the test and retest all the students had two
semesters of study of biology which resulted in changes in
individual scores and relative standings.

The chief weakness of the alternate-forms method according
to Guilford (10) is the fact that many tests do not have
alternate equivalent forms and if there are alternate forms,
individual differences in learning from the practice effect of the
first test may create differences not due to errors of
measurement. This method could not be used with the biology test
because there was no alternate form.

Criticism of the split-half method is based on the fact
that there are many different possible splits, no one of which
can be said to be the only correct one on which to base an
estimate of the reliability of the test. Cronbach (7) reported
making thirty random splits and fourteen parallel splits of a 86
item silent reading test without securing any two identical
reliability coefficients when carried to three decimal places. His
purpose was to study the split-half method rather than to study
the particular test used and therefore he attempted to make every
conceivable split of the data. He did not find parallel splits
superior to random splits in a homogeneous test. He concluded
that in determining the reliability coefficient of any test by
the split-half method at least two splits should be made and that
the means and standard deviations of each half should be reported
in addition to the coefficient.

The split-half method assumes that the test is divided into
two equivalent halves and determines the correlation between the
two halves. Since this method yields the reliability coefficient
for a test just half the length of the original test the
Spearman-Brown formula is used as a correction to determine the
reliability coefficient of the full length test. This formula is
described by Guilford (10) who calls attention to the fact that
its use requires that the two halves must have equal standard
deviations.
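
The correction can be illustrated with a minimal sketch of the Spearman-Brown formula in its general form, nr / (1 + (n - 1)r), where n is the factor by which the test is lengthened (2 for the step from a half test to the whole test). The function name is hypothetical.

```python
def spearman_brown(r_half, factor=2.0):
    """Estimated reliability of a test lengthened by `factor`.

    r_half -- correlation between the two halves (or reliability of
              the shorter test)
    factor -- ratio of new length to old length; 2 for split halves
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


if __name__ == "__main__":
    # A correlation of .60 between halves corresponds to a full-length
    # reliability estimate of .75.
    print(spearman_brown(0.60))
```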

Guttman (11) has contributed a formula for the split-half
reliability coefficient which is not dependent upon equal standard
deviations of the two halves. He also recognized the variability
of the reliability coefficient and proposed that reliability be
described in terms of "lower and upper bounds" rather than as a
precise coefficient.
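
A split-half coefficient of this kind is commonly written as 2(1 - (var_a + var_b) / var_t), where var_a and var_b are the variances of the two half scores and var_t is the variance of the total scores. The sketch below assumes that this is the form intended; the function name and the data are hypothetical.

```python
from statistics import pvariance


def guttman_split_half(half_a, half_b):
    """Split-half reliability from the variances of the halves.

    half_a, half_b -- each student's score on the two halves
    Assumes the form 2 * (1 - (var_a + var_b) / var_total), which does
    not require the halves to have equal standard deviations.
    """
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 2.0 * (1.0 - (pvariance(half_a) + pvariance(half_b))
                  / pvariance(totals))


if __name__ == "__main__":
    odd_half = [1, 2, 3, 4]
    even_half = [2, 1, 4, 3]
    print(guttman_split_half(odd_half, even_half))
```

When the two halves happen to have equal variances, as in this example, the result agrees with the Spearman-Brown correction of the correlation between the halves.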

Cronbach (8) has divided the concept of reliability into
four different definitions which he classified as the hypothetical
self-correlation, the coefficient of equivalence, the
coefficient of stability and the coefficient of equivalence and
stability. He stated that the test-retest method yielded the
coefficient of stability; the alternate-forms method produced the
coefficient of equivalence and stability; the split-half method
and the Kuder-Richardson formula yielded coefficients of
equivalence and hypothetical self-correlation was obtained by the
Guttman formula. He stated that there was no single best
estimate of the reliability of a test; that all four were valuable in
studying a test but that they were not interchangeable and
knowledge of the method used was essential to the interpretation of
the reliability coefficient.

Richardson and Kuder (21) derived new methods of estimating
test reliability coefficients based upon rational equivalence to
eliminate the problems of obtaining comparable halves and of
determining which of several equally acceptable methods of dividing
the test into halves should be used. Cronbach (7, 8) and
Guttman (11) agree that the Kuder-Richardson formula is a
conservative estimate of reliability and while it may underestimate
the reliability of a test it will not overestimate it. Several
variations of the Kuder-Richardson formula have been devised to
give a shorter approximation of the reliability coefficient. The
basic Kuder-Richardson formula as described by Guilford (10) was
chosen as the method for estimating the reliability of the biology
test. This formula is:

$$ r_{tt} = \frac{n}{n - 1} \cdot \frac{\sigma_t^2 - \sum pq}{\sigma_t^2} $$

where $r_{tt}$ = reliability coefficient for the whole test
      $n$ = number of items in the test
      $\sigma_t$ = standard deviation of the total test scores
      $p$ = proportion of the group passing an item
      $q$ = proportion failing to pass the item
The reliability coefficient of the 100 question biology test,
as determined by the Kuder-Richardson formula described above,

was  aSSS* 
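
For illustration, the basic Kuder-Richardson formula (often called formula 20) is usually stated as n/(n - 1) times (var_t - sum of pq) / var_t, with var_t the variance of the total scores. The sketch below applies it to a small matrix of right/wrong item scores; the function name and the data are hypothetical and are not the biology test results.

```python
from statistics import pvariance


def kuder_richardson_20(item_matrix):
    """Kuder-Richardson reliability from a students-by-items matrix.

    item_matrix -- list of rows, one per student, each a list of 1/0
                   item scores (hypothetical layout)
    Needs at least two items and some variation in total scores.
    """
    n_students = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_total = pvariance(totals)
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_students  # passing
        sum_pq += p * (1.0 - p)                              # p times q
    return n_items / (n_items - 1) * (var_total - sum_pq) / var_total


if __name__ == "__main__":
    scores = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
    ]
    print(kuder_richardson_20(scores))
```

Because only the item proportions and the total-score variance are needed, no decision about how to split the test into halves is required.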

Validity 

Validity has been defined by Lindquist (13) as the accuracy
with which a test measures that which it is intended to measure.
It is expressed as the coefficient of correlation between the
total test score and a criterion and may be used to predict
criterion scores for other persons from the same population whose
test scores are known but whose criterion ratings are not known.
Guilford (10) has pointed out that regardless of what a test is
intended to measure, it is a valid test for any sphere of
behavior in which it makes prediction of behavior possible.
Therefore, no test may be said to have a single validity coefficient
as any statement of validity depends upon the criterion used to
determine the predictive value of the test.

Grades in the course, Biology in Relation to Man, were used
as the criterion for determining the validity of the biology
test. Values of 5, 4, 3, 2 and 1 were assigned to the letter
grades of A, B, C, D and F respectively for each semester.
Grades for each student for the two semesters were combined,
giving a range in composite grades from 10 for the students with
two A's to 3 for the students with one D and one F.

The Pearson product-moment formula was applied to determine
the coefficient of correlation of scores on the test given at
the conclusion of the course with the composite grades for the
two semesters. A validity coefficient of .624 was obtained,
which has a predictive value of 21.9 per cent better than chance
according to tables supplied by Bingham (2). The students'
scores on this objective test were not used in determining the
letter grades for the course, so self-correlation did not
increase the validity coefficient.
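
The steps of this procedure may be illustrated by a short sketch.
It assumes that the tabled "per cent better than chance" is the
index of forecasting efficiency, 100(1 - sqrt(1 - r^2)), which
agrees with the values quoted for the validity coefficients in
this study; the function names and the grade-point mapping as
coded here are illustrative only.

```python
import math

# Grade values assumed from the text: A=5, B=4, C=3, D=2, F=1.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}


def composite_grade(first_semester, second_semester):
    """Combine two semester letter grades into one criterion score."""
    return GRADE_POINTS[first_semester] + GRADE_POINTS[second_semester]


def pearson_r(xs, ys):
    """Pearson product-moment coefficient of correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def forecasting_efficiency(r):
    """Per cent better than chance, assuming E = 1 - sqrt(1 - r^2)."""
    return 100 * (1 - math.sqrt(1 - r * r))


print(composite_grade("A", "A"), composite_grade("D", "F"))
print(round(forecasting_efficiency(0.624), 1))
```

In practice pearson_r would be applied to the list of test scores
and the list of composite grades for the same students.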

Guilford (10), Bingham (2) and others emphasize the importance
of securing adequate criteria. They regard this as one of
the most difficult aspects in the validation of tests. Bingham
(2) indicates that failure to secure a high coefficient of
validity for a test is due not only to the lack of validity and
reliability of the test itself, but also to the lack of either
perfect reliability or validity of the criterion. In determining
the validity of the biology test, semester grades supplied
a coarse grouping of criterion scores which could have been
improved if the scores on the tests which made up the letter
grades for the course had been available to permit more precise
grouping. Since both the biology test and grades in the course
were intended to measure basic knowledge in biology, the
correlation coefficient of .624 of grades for the course and test
scores at the end of the course suggests that the test is a
fairly valid measure of basic knowledge in biology.

The possibility of predicting grades in the course by scores
on the test given before starting the course was investigated by
computing the Pearson product-moment coefficient of correlation
of composite grades for the two semesters and test scores at the
beginning of the course. A coefficient of .32 was obtained, which
has a predictive value of 5.26 per cent better than chance
according to tables supplied by Bingham (2). A survey of the
distribution of scores suggested that the limited value of the
test for predicting grades when given before the course did not
reflect a weakness of the test so much as it indicated that basic
knowledge of biology prior to the course is not essential to
success in the course. The number of students who had low
grades on the preliminary test but received high grades in the
course suggests that good students could succeed in the course
without previous training in biology. This conclusion is
supported by the fact that no prerequisites are required for the
course.

Comparison of Item Validities

The validity of each item as expressed by its correlation
with the total test score was computed by the Toops-Adkins
method. These validity coefficients are shown in Table 1. They
range from -.039 to .501. Five of the validity coefficients
were above .400; 12 were between .300 and .400; and 49 were
between .200 and .300.


Key to Table 1

a = number of right responses chosen by students
    in highest fifth
b = number of right responses chosen by students
    in second fifth
c = number of right responses chosen by students
    in third fifth
d = number of right responses chosen by students
    in fourth fifth
e = number of right responses chosen by students
    in lowest fifth
R = total right responses
W = total wrong responses
2RW = product of right and wrong answers
    multiplied by 2
√2RW = denominator of Toops-Adkins formula
r = correlation coefficient of item and criterion
2a+b-d-2e = numerator of Toops-Adkins formula
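
The quantities in the key combine into a single coefficient, as
the sketch below suggests. It assumes that the formula is the
Pearson r obtained when the five fifths are of equal size and are
scored +2, +1, 0, -1 and -2, which gives a numerator of
2a + b - d - 2e and a denominator of the square root of 2RW. It
also assumes that W counts every student who did not choose the
right response. The function name and the counts are hypothetical.

```python
import math


def toops_adkins_r(a, b, c, d, e, n_students):
    """Item validity from right-response counts in five equal fifths.

    a..e: number of right responses in the highest..lowest fifth.
    n_students: total number of students (five equal fifths assumed).
    """
    right = a + b + c + d + e          # R, total right responses
    wrong = n_students - right         # W, all other students (assumed)
    numerator = 2 * a + b - d - 2 * e
    denominator = math.sqrt(2 * right * wrong)
    return numerator / denominator


# Hypothetical item: 100 students, fifths of 20 students each.
print(round(toops_adkins_r(20, 15, 10, 5, 0, 100), 3))
```

An item answered equally often in every fifth yields a coefficient
of zero under this formula, since the numerator vanishes.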

Order of Difficulty

The arrangement of items in rank order of difficulty, easiest
items first, is shown in Table 2 with the validity coefficient
for each item.

Because some difference of opinion was found among other
writers regarding the relationship of item validity and item
difficulty, their relationship in the biology test was studied. After
the test items were ranked in order of difficulty they were also
ranked in order of validity. The Spearman formula, as given by
Kelley (16), for rank order correlation was applied and a rho
coefficient of .08 was obtained. A correction is necessary to make
the Spearman rho strictly comparable to the Pearson r. This
correction was made according to a table supplied by Guilford (10).
The corrected coefficient was .084. The standard error of rho
computed according to the formula given by Guilford (10) was .105.
At the one per cent level of confidence the limits of the true
rho are -.192 and .352. This coefficient of correlation suggests
that there is no significant rectilinear relationship between
item validity and item difficulty in the biology test.
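
The rank-order computation can be sketched as follows. The sketch
assumes the usual Spearman formula, rho = 1 - 6(sum of d^2) /
(n(n^2 - 1)), and assumes that the tabled correction is Pearson's
conversion r = 2 sin(pi rho / 6), which reproduces the corrected
value of .084 from a rho of .08. The standard error of .105 and
the multiplier 2.58 for the one per cent level are taken as given;
the function names are hypothetical.

```python
import math


def spearman_rho(rank_x, rank_y):
    """Rank-order correlation from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def rho_to_pearson(rho):
    """Assumed correction making rho comparable to the Pearson r."""
    return 2 * math.sin(math.pi * rho / 6)


def confidence_limits(rho, standard_error, z=2.58):
    """Limits of the true rho at the one per cent level (z = 2.58)."""
    return rho - z * standard_error, rho + z * standard_error


print(round(rho_to_pearson(0.08), 3))
low, high = confidence_limits(0.08, 0.105)
print(round(low, 3), round(high, 3))
```

The limits computed in this way fall within about .001 of those
reported, and the interval includes zero, which is the basis for
concluding that no significant rectilinear relationship exists.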

The possibility of a curvilinear relationship was investigated
by the critical ratio technic. The mean item validity of
the group of questions composed of the 25 easiest items and the
25 most difficult items was .IS!} as compared with the mean item
validity of .245 of the 50 questions of medium difficulty. The
standard error of the difference between the means was .0195

Choice of Responses


The choice of optional responses was analysed for the purpose
of discovering possible revisions to increase item validity.
This analysis is shown in Table 3. The Toops-Adkins method (1)
was applied to determine the validity of each response.

Alternatives which were chosen by none or almost none of the
students should, as Turnbull (25) recommended, be made more
plausible because, if they do not attract any one, they
contribute nothing to the test. A good example of this is found in
Item 40, where 230 students chose response "a"; 157 chose
response "b", the right answer; ten students omitted the item; and
only three were attracted to any of the other three wrong
responses. Even though five optional responses were offered, it
was in effect a two response item. Where possible the questions
with less than five options, such as S6 and 44, should be
lengthened to make the test uniform, but offering additional
options would not serve the desired purpose unless they could be
made plausible enough to attract some of the students.

The principle of making unused responses more plausible
might also be applied to items which had low validity because
nearly all the students chose the right answer.

According to Adkins and Toops (1), the right answer should
have a positive validity coefficient as high as possible, and
the wrong responses should have low or negative validity
coefficients. Any wrong response which has an appreciable positive
validity coefficient because it attracts a larger proportion of
the better students than of the poorer students should be
revised. Examples of such responses are found in 17 e, 23 c, 95 c,
and 37 d. Such items are not valid if a wrong response is
similar enough to the right answer that it attracts greater
numbers of the better students, who have some knowledge of the
subject, than of the poorer students, who divide their choices
more evenly among the other wrong answers.

The possibility of changing the wording of the item should
be considered in cases where an unusually large number of all
students omitted the item. An example is Item 11 which was
omitted by more than half of the students. Changing the wording
might also improve items which seemed too difficult because so
few answered them correctly. Ruch (22) has recommended the use
of "simple everyday words in preference to more technical or
literary synonyms." An example of this type may be found in
Item 34 which could probably be improved by substituting
"changes" for the key word "transforms".

Faint or illegible mimeographing of the right answer may have
been a contributing factor in the low validity of Item 95.

The tendency of students not to read everything is well
illustrated by the responses to Item 1. This item with the
correct answer was given as an illustration on the instruction
sheet, but in spite of this it ranked twelfth in difficulty and
was missed by more of the good students than of the poorer
students.

Ruch (22) recommended placing the correct response in each
position an approximately equal number of times. In the biology
test the correct response appeared as "a" 17 times, as "b" 21
times, as "c" 24 times, as "d" 24 times and as "e" 14 times. As
an optional response "e" was offered with only 66 items, that
response was the correct one in a fair proportion of items; but
the 17 times the correct answer was placed in position "a" was
less than should have occurred in a chance distribution.
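
The chance expectation for each position may be worked out as in
the sketch below. It assumes, purely for illustration, that the
34 items without an "e" option each offered four options; the
text states only that some items had fewer than five options, so
the exact breakdown is not known. The function name is
hypothetical.

```python
def expected_correct_counts(options_per_item):
    """Expected number of times each position holds the right answer.

    options_per_item: number of options offered by each item.
    Each offered position is assumed equally likely to be correct.
    """
    expected = {}
    for k in options_per_item:
        for letter in "abcde"[:k]:
            expected[letter] = expected.get(letter, 0.0) + 1.0 / k
    return expected


# Assumed layout: 66 five-option items and 34 four-option items.
layout = [5] * 66 + [4] * 34
expected = expected_correct_counts(layout)

# Observed counts reported for the biology test.
observed = {"a": 17, "b": 21, "c": 24, "d": 24, "e": 14}
for letter in "abcde":
    print(letter, observed[letter], round(expected[letter], 1))
```

Under this assumed layout the expectation for position "a" is
above the 17 observed, while the expectation for "e" is close to
the 14 observed, in keeping with the remarks above.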

Ruch (22) also recommended that the same response should not
appear in the same position more than two or three successive
times. This was violated only once in the biology test when
"c" was the correct response to Items 46, 47, 48, and 49.


Key to Table 3

Horizontal Heading

I = Item number in test
Ch = Optional choices
A = Choices by highest fifth of students
B = Choices by second fifth of students
C = Choices by third fifth of students
D = Choices by fourth fifth of students
E = Choices by lowest fifth of students
r = Validity of response by Toops-Adkins method

Vertical Code

a = Optional answer "a"
b = Optional answer "b"
c = Optional answer "c"
d = Optional answer "d"
e = Optional answer "e"
o = Omission of choice of answers
- = Optional answer not offered

Revised Test of Sixty Items

For experimental purposes the 60 items having the highest
validity coefficients were selected and all answer sheets were
scored again on the basis of these items only.

The validity of this 60 question test was determined by
computing the Pearson product-moment coefficient of correlation of
test scores and grades for the two semesters in Biology in
Relation to Man. The obtained validity coefficient of .656 had
a predictive value of 24.53 per cent efficiency according to
tables supplied by Bingham (2). This validity coefficient was
significantly higher than the coefficient of .624 for the
entire 100 item test. The standard error of the difference
between the two r's was .0096 which yielded a critical ratio of
3»S7 indicating that there were only 14 chances in 1000 that the
difference was due to sampling error.

The fact that the correlation with an independent criterion
was significantly increased by the elimination of items
having low validity by the criterion of internal consistency
suggests that some of the criticism of the method of internal
consistency is not applicable in all cases.

The Pearson product-moment correlation coefficient between
the scores on the 100 item test and the 60 item test was .953.

The reliability coefficient of the 60 question test as
determined by the Kuder-Richardson formula was .838 as compared
with .832 for the 100 question test by the same method. Guilford
(10) has pointed out that there is an increase in reliability
with an increase in the length of a test. Therefore, the
slight increase in reliability in spite of the reduction in
length of the test was significant. The Spearman-Brown prophecy
formula as given by Guilford (10) indicated that a test of 100
items homogeneous with the 60 items would have a reliability
coefficient of .896.
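
The prophecy computation is brief enough to sketch. The sketch
assumes the standard form of the Spearman-Brown formula,
kr / (1 + (k - 1)r), where k is the ratio of the new length to
the old, and takes the 60 item reliability as .838, which is an
assumed reading of the figure for that test. The function name
is hypothetical.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability of a test lengthened by length_ratio.

    Assumes the added items are homogeneous with the original ones.
    """
    k = length_ratio
    return k * reliability / (1 + (k - 1) * reliability)


# Lengthening a 60 item test to 100 items: k = 100 / 60.
# The value .838 is an assumed reading of the 60 item reliability.
print(round(spearman_brown(0.838, 100 / 60), 3))
```

A length ratio of 1 returns the reliability unchanged, and ratios
below 1 give the expected loss when a test is shortened.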

The more reliable a valid test becomes, the higher its
validity coefficient may be expected to be if other variables
remain the same. Therefore, increasing the length of a test with
homogeneous items may increase its validity coefficient.
Formulas for estimating the validity coefficient of a test when
lengthened are given by Edgerton and Toops (9), Guilford (10),
Kelley (16) and Lindquist (19). Edgerton and Toops (9) also
furnished tables from which the new validity and reliability
coefficients of a test may readily be computed when a test of
known validity and reliability is increased by two to 15 times
its length. All the formulas are based on the same principle
and yielded a validity coefficient of .678 for a 100 item test
consisting of items homogeneous with the 60 item test, as
compared with the validity coefficient of .656 for the 60 item test.
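
One common form of such a formula is sketched below. It assumes
that the validity of the lengthened test is the original validity
multiplied by sqrt(k) / sqrt(1 + (k - 1) r_xx), where k is the
length ratio and r_xx the reliability of the original test. With
a validity of .656 and an assumed 60 item reliability of .838
this form reproduces the estimate of .678. The function name is
hypothetical.

```python
import math


def lengthened_validity(validity, reliability, length_ratio):
    """Predicted validity when a test is lengthened by length_ratio.

    validity: correlation of the original test with the criterion.
    reliability: reliability coefficient of the original test.
    Assumes added items are homogeneous with the original items.
    """
    k = length_ratio
    return validity * math.sqrt(k) / math.sqrt(1 + (k - 1) * reliability)


# 60 item test lengthened to 100 items; .838 is an assumed reading.
print(round(lengthened_validity(0.656, 0.838, 100 / 60), 3))
```

The gain in validity is small because the 60 item test is already
fairly reliable, so added length removes little remaining error.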

Reduction to a 60 item test was not recommended but
revision of the optional responses to some items and substitution
of new items for some others to maintain the 100 item length
was suggested.

SUMMARY AND CONCLUSIONS

A 100 item objective test in biology was taken by Kansas
State College students at the beginning of the course, Biology in
Relation to Man, and again at the end of the two semester course.
An item analysis of the test was made to obtain information for
use in the refinement of the test before final validation and
standardisation.

The reliability coefficient of the test by the Kuder-
Richardson formula was found to be .832.

A validity coefficient of .624 for the test was obtained by
correlation of the test scores with grades for the two semesters
in the course.

The Toops-Adkins method of item analysis was used to
determine the validity of each item by its correlation with the total
test score. Item validities ranged from -.039 to .501.

The relationship of item validity and item difficulty was
investigated. The mean validity of items of medium difficulty
was significantly higher than that of the extremely easy or
extremely difficult questions.

The sixty items with the highest validities were selected
and all answer sheets were rescored on the basis of these items
only. The validity coefficient of the 60 question test obtained
by correlation with grades was .656 which was significantly
higher than the validity coefficient of .624 for the total 100
question test. The reliability coefficient by the Kuder-
Richardson method of the 60 item test was .838 which was slightly
higher than the reliability coefficient of .832 for the 100
item test. The Spearman-Brown formula indicated that a 100 item
test consisting of items homogeneous with these 60 questions
would have a reliability coefficient of .896. The validity
coefficient of a 100 item test homogeneous with the 60 item test,
according to a formula given by Guilford (10), was estimated at
.678.

The choice of responses to all items was analysed by the
Toops-Adkins method as a basis for improvement of items.
Revision of test items was recommended by either eliminating or
making more plausible the responses which were chosen by few or
no students. Revision to reduce the similarity to the right
answer or elimination of the response was recommended in cases
where a wrong response had a relatively high validity
coefficient.

The test as a whole met minimum standards as to reliability
and validity but item analysis showed that it could be
significantly improved by revision of optional responses to
certain items and elimination of other items of low validity.